feat: add MSSQL/Fabric support to data-parity skill #705
suryaiyer95 wants to merge 18 commits into main
Conversation
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
📝 Walkthrough
Adds Microsoft Fabric support and SQL Server driver Azure AD token flows (cached tokens, …)
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant Driver as SQL Server Driver
    participant AzureId as `@azure/identity`
    participant CLI as az CLI
    participant AzureAD as Azure AD
    participant MsSQL as mssql Library
    App->>Driver: connect(config with Azure AD auth)
    Driver->>Driver: normalize auth type & resolve resource URL
    alt `@azure/identity` available
        Driver->>AzureId: getToken(scope)
        AzureId->>AzureAD: request token
        AzureAD-->>AzureId: return token
        AzureId-->>Driver: access token
    else fallback to az CLI
        Driver->>CLI: exec("az account get-access-token --resource <res>")
        CLI->>AzureAD: request token
        AzureAD-->>CLI: return token
        CLI-->>Driver: token string
    end
    Driver->>MsSQL: new ConnectionPool(config with token)
    MsSQL-->>Driver: connected pool
    Driver-->>App: connected instance
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: 1 passed, 2 failed (warnings)
5 issues found across 30 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>
<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>
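Both findings reduce to two small string helpers. A minimal sketch under assumed names (`escapeSqlLiteral` and `quoteTsqlIdentifier` are illustrative, not the actual functions in `data-diff.ts`):

```typescript
// Violation 1: double any single quotes inside a string literal (T-SQL escaping),
// matching the `.replace(/'/g, "''")` already used by the categorical branch.
function escapeSqlLiteral(value: string): string {
  return value.replace(/'/g, "''")
}

// Violation 2: bracket-quote an identifier so reserved words like `date` or
// `order` parse correctly; a `]` inside the name is doubled per T-SQL rules.
function quoteTsqlIdentifier(name: string): string {
  return `[${name.replace(/]/g, "]]")}]`
}

// A partition predicate built with both helpers (illustrative values):
const partitionColumn = "order"          // reserved word
const partitionValue = "2024-01-01'; --" // hostile input
const whereClause = `${quoteTsqlIdentifier(partitionColumn)} = '${escapeSqlLiteral(partitionValue)}'`
// → [order] = '2024-01-01''; --'
```

Applying both helpers in `buildPartitionDiscoverySQL` and `buildPartitionWhereClause` would keep the two functions consistent, as the review asks.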
<file name="packages/opencode/src/altimate/native/connections/registry.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>
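A sketch of the suggested registry change; `PASSWORD_DRIVERS` exists per the finding, but the `validatePassword` helper and exact shape here are assumptions:

```typescript
// `fabric` added alongside the existing sqlserver/mssql entries so all three
// aliases of the same TDS driver get the same friendly error.
const PASSWORD_DRIVERS = new Set(["sqlserver", "mssql", "fabric"])

function validatePassword(driver: string, password: unknown): void {
  if (PASSWORD_DRIVERS.has(driver) && typeof password !== "string") {
    throw new Error(`Driver "${driver}" requires a string password`)
  }
}
```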
<file name="packages/opencode/src/altimate/tools/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>
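A minimal sketch of the suggested fallback; the `DiffRow` type and `formatDiffRow` name are illustrative, and only the `?? "(no values)"` pattern is the point:

```typescript
type DiffRow = { values?: string[] }

function formatDiffRow(label: string, d: DiffRow): string {
  // `?? "(no values)"` prevents a nullish `d.values` from being coerced
  // into the literal string "undefined" by the template literal.
  return `[${label}] ${d.values?.join(" | ") ?? "(no values)"}`
}
```

For example, `formatDiffRow("source only", {})` yields `[source only] (no values)` instead of `[source only] undefined`.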
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>
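A sketch of the suggested restriction, with an assumed `flattenRow` shape (the real signature in `sqlserver.ts` may differ):

```typescript
// Only the empty-string key holds mssql's merged unnamed columns, so only
// that key is spread; named array values (e.g. JSON aggregates) pass through.
function flattenRow(row: Record<string, unknown>): unknown[] {
  const out: unknown[] = []
  for (const [key, value] of Object.entries(row)) {
    if (key === "" && Array.isArray(value)) {
      out.push(...value) // mssql's merged unnamed columns
    } else {
      out.push(value) // named columns, including legitimate arrays
    }
  }
  return out
}
```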
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
403477e to 811c2be
@suryaiyer95 can you address the code review comments?
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:90">
P2: Avoid `execSync` in the async connection path; it can block the event loop for up to the CLI timeout.</violation>
</file>
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:819">
P2: `crossWarehouse` is computed from unresolved params, so an omitted `source_warehouse` can be misclassified as cross-warehouse when both sides actually run on the same default warehouse.</violation>
</file>
Bun's runtime never fires native addon async callbacks, so the async `new duckdb.Database(path, opts, callback)` form would hit the 2-second timeout fallback on every connection attempt. Switch to the synchronous constructor form `new duckdb.Database(path)` / `new duckdb.Database(path, opts)` which throws on error and completes immediately in both Node and bun runtimes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The async callback form with 2s fallback was already working correctly at e3df5a4. The timeout was caused by a missing duckdb .node binary, not a bun incompatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql, fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes a critical serde mismatch where the Rust engine rejects raw warehouse type names.
- Update both `resolveDialect()` functions to use the mapping
- Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col)
- Add locale-safe date literal casting via CONVERT(DATE, ..., 23)
- Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver)
- Add `fabric` normalize aliases in normalize.ts
- Add 15 SQL Server driver unit tests (TOP injection, truncation, schema introspection, connection lifecycle, result format)
- Add 9 dialect mapping unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
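The mapping described above could look roughly like this (a sketch; the real `warehouseTypeToDialect()` lives in the data-diff dialect resolver and may differ in detail):

```typescript
// Translate user-facing warehouse type names into the dialect names the
// Rust engine accepts; unlisted types are assumed to pass through lowercased.
function warehouseTypeToDialect(warehouseType: string): string {
  const map: Record<string, string> = {
    sqlserver: "tsql",
    mssql: "tsql",
    fabric: "fabric",
    postgresql: "postgres",
    mariadb: "mysql",
  }
  const key = warehouseType.toLowerCase()
  return map[key] ?? key
}
```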
- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`: `azure-active-directory-password`, `access-token`, `service-principal-secret`, `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential`
- Force TLS encryption for all Azure AD connections
- Dynamic import of `@azure/identity` for `DefaultAzureCredential`
- Add normalize aliases for Azure AD config fields (`authentication`, `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`)
- Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES
- Add 10 Azure AD unit tests covering all auth flows, encryption, and `DefaultAzureCredential` with managed identity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LL.md

- Add SQL Server / Fabric schema inspection query in Step 2
- Add "SQL Server and Microsoft Fabric" section with:
  - Supported configurations table (sqlserver, mssql, fabric)
  - Fabric connection guide with Azure AD auth types
  - Algorithm behavior notes (joindiff vs hashdiff selection)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rscore column filter
- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
`@azure/identity` internally and creates credentials — avoids bun CJS/ESM
`isTokenCredential` boundary issue that caused "not an instance of the token
credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
`msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
columns — it stripped legitimate aliases like `_p` used by partition discovery,
causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
real MSSQL server (profile, joindiff, auto, where_clause, partitioned).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
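The shorthand mapping described in the first bullet can be sketched as follows; the full tedious type names are the documented ones, but the exact shorthand table and the `resolveAuthType` helper are assumptions:

```typescript
// Assumed shorthand → full tedious authentication type table.
const AUTH_SHORTHANDS: Record<string, string> = {
  cli: "azure-active-directory-default",
  default: "azure-active-directory-default",
  password: "azure-active-directory-password",
  "service-principal": "azure-active-directory-service-principal-secret",
  msi: "azure-active-directory-msi-vm",
  "managed-identity": "azure-active-directory-msi-vm",
}

function resolveAuthType(raw: unknown): string | undefined {
  // Type-guard first so non-string authentication blocks don't blow up
  // on toLowerCase (see the later review note on this driver).
  if (typeof raw !== "string") return undefined
  const lower = raw.toLowerCase()
  return AUTH_SHORTHANDS[lower] ?? lower
}
```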
…lattening

- Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19)
- Use explicit `ConnectionPool` instead of global `mssql.connect()` to isolate multiple simultaneous connections
- Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g. `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string key; restore positional column values
- Proper column name resolution: compare `namedKeys.length` against flattened row length, fall back to synthetic `col_0`, `col_1`, etc.
- Update test mock to export `ConnectionPool` class and `createMockPool`
…tions
Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
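A minimal illustration of the spread pattern being replaced (the `tenantId` option is a made-up example, not the driver's actual config):

```typescript
// With short-circuit spread, `tenantId && { tenantId }` has type
// `boolean | { tenantId: string }`, so the object literal may spread a
// boolean; the ternary form always yields an object.
const tenantId: string | undefined = undefined

const options = {
  encrypt: true,
  ...(tenantId ? { tenantId } : {}),
}
```

At runtime spreading `false`/`undefined` happens to be a no-op, so the ternary is mainly about type safety and intent.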
- P1: restrict `flattenRow` to only spread the empty-string key (`""`) where mssql merges unnamed columns, preserving legitimate array values
- P2: escape single quotes in `partitionValue` for date-mode branches in `buildPartitionWhereClause` (categorical mode already escaped)
- P2: add `fabric` to `PASSWORD_DRIVERS` set in registry for consistent password validation alongside `sqlserver`/`mssql`
- P2: fallback to `"(no values)"` when `d.values` is nullish to prevent the template literal coercing `undefined` to the string `"undefined"`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- sqlserver-unit: 3 tests for unnamed column flattening — verifies only the empty-string key is spread, legitimate named arrays are preserved
- driver-normalize: fabric type uses SQLSERVER_ALIASES (server → host, trustServerCertificate → trust_server_certificate)
- connections: fabric type is recognized in DRIVER_MAP and listed correctly
- Add "Minimum Version Requirements" table to SKILL.md covering SQL Server 2022+, mssql v12, and @azure/identity v4, with rationale for each
- Document auth shorthands (CLI, default, password, service-principal, msi)
- Move @azure/identity from dependencies to optional peerDependencies so it is NOT installed by default — only required for Azure AD auth
- Add runtime check in sqlserver driver: if an Azure AD auth type is requested but @azure/identity is missing, throw a clear install-instruction error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solution

- For `azure-active-directory-default` (CLI/default auth), acquire the token ourselves instead of delegating to tedious's internal `@azure/identity`
- Strategy: try `DefaultAzureCredential` first, fall back to an `az` CLI subprocess
- Bypasses Bun resolving `@azure/identity` to the browser bundle, where `DefaultAzureCredential` is a non-functional stub
- Also bypasses the CJS/ESM `isTokenCredential` boundary mismatch
- All 31 driver unit tests pass, verified against a real Fabric endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken` when none supplied
The `azure-active-directory-access-token` branch passed
`token: config.token ?? config.access_token` to tedious. When neither
field was set on a connection (e.g. a `fabric-migration` entry that
declared the auth type but no token), tedious threw:
TypeError: The "config.authentication.options.token" property
must be of type string
This blocked any Fabric/MSSQL config that relied on ambient credentials
(Azure CLI / managed identity) but used the explicit
`azure-active-directory-access-token` type instead of the `default`
shorthand.
Refactor token acquisition (`DefaultAzureCredential` → `az` CLI
fallback) into a shared `acquireAzureToken()` helper used by both the
`default` path and the `access-token` path when no token was supplied.
Callers that pass an explicit token are unchanged.
Also harden `mock.module("node:child_process", ...)` in
`sqlserver-unit.test.ts` to spread the real module so sibling tests in
the same `bun test` run keep access to `spawn` / `exec` / `fork`.
Tests: 110 pass, 0 fail in `packages/drivers`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ry mode

When `source` and `target` are both SQL queries, `resolveTableSources` wraps them as `__diff_source` / `__diff_target` CTEs and the executor prepends the combined `WITH …` block to every engine-emitted task. T-SQL and Fabric parse-bind every CTE body even when unreferenced, so a task routed to the source warehouse failed to resolve the target-only base table referenced inside the unused `__diff_target` CTE (and vice versa), producing `Invalid object name` errors from the wrong warehouse.

Return side-specific prefixes from `resolveTableSources` alongside the combined one, and have the executor loop in `runDataDiff` pick the source or target prefix per task when `source_warehouse !== target_warehouse`. Same-warehouse behaviour is unchanged.

Adds `data-diff-cte.test.ts` covering plain-name passthrough, both-query wrapping, side-specific CTE isolation, and CTE merging with engine-emitted `WITH` clauses (10 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
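A toy sketch of the side-specific prefix routing described above (names and signatures are illustrative, not the actual `data-diff.ts` API):

```typescript
type Prefixes = { combined: string; source: string; target: string }

// Wrap each query-mode side in its own CTE and also build the combined form.
function resolveTableSources(sourceSql: string, targetSql: string): Prefixes {
  const source = `WITH __diff_source AS (${sourceSql})`
  const target = `WITH __diff_target AS (${targetSql})`
  const combined = `WITH __diff_source AS (${sourceSql}), __diff_target AS (${targetSql})`
  return { combined, source, target }
}

// Cross-warehouse tasks get only their own side's CTE, so a T-SQL/Fabric
// parse-bind of the unused CTE body can never hit the wrong warehouse.
function prefixForTask(p: Prefixes, side: "source" | "target", crossWarehouse: boolean): string {
  if (!crossWarehouse) return p.combined
  return side === "source" ? p.source : p.target
}
```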
Commit 333a45c moved `@azure/identity` from `optionalDependencies` to `peerDependencies` with `optional: true` in `packages/drivers/package.json`, but the lockfile was not regenerated. That left CI under `--frozen-lockfile` broken and made fresh installs silently diverge from the committed state.

Running `bun install` brings the lockfile in sync: `@azure/identity` is recorded as an optional peer, and its transitive pins (`@azure/msal-browser`, `@azure/msal-common`, `@azure/msal-node`) re-resolve to the versions required by `tedious` and `snowflake-sdk`, matching the reachable runtime surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
52fe09e to 1977232
Actionable comments posted: 9
🧹 Nitpick comments (4)
packages/opencode/src/altimate/native/connections/data-diff.ts (1)
815-820: Cross-warehouse detection is a purely textual compare.
`sourceWarehouse !== targetWarehouse` treats `undefined` vs an explicit name as "cross-warehouse" even when the explicit name is the default (first) warehouse. The side-specific CTE injection is a strict superset of the combined prefix, so correctness isn't affected — just a minor inefficiency (a slightly narrower CTE sent). Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines 815 - 820, Summary: crossWarehouse currently does a raw textual compare (sourceWarehouse !== targetWarehouse) which can misclassify undefined vs an explicit default warehouse name as cross-warehouse. Fix: canonicalize both sides before comparing by resolving undefined/empty values to the effective/default warehouse name used elsewhere (use the same resolver used for connection/effective settings) instead of comparing raw params.source_warehouse and params.target_warehouse; update the variables referenced (sourceWarehouse, targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !== resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add one) rather than inventing a new default string.

packages/drivers/test/sqlserver-unit.test.ts (1)
483-489: Assertion gap lets the "mixed named + unnamed" column-naming bug slip through.
This test verifies `rows` for `{ name: "alice", "": [42] }` but doesn't assert `result.columns`. Under the current implementation (sqlserver.ts lines 210-217), mixing a named column with multiple unnamed values (e.g. `{ name: "alice", "": [42, 100] }`) collapses the header to `["col_0", "col_1", "col_2"]` and drops `name`. Adding a case with ≥ 2 unnamed values and asserting `columns` would have caught that and will guard whichever fix lands.

♻️ Suggested additional case

```diff
  test("handles mix of named and unnamed columns", async () => {
    mockQueryResult = {
      recordset: [{ name: "alice", "": [42] }],
    }
    const result = await connector.execute("SELECT * FROM t")
    expect(result.rows).toEqual([["alice", 42]])
  })
+
+ test("preserves named-column header when mixed with multiple unnamed columns", async () => {
+   // SELECT name, COUNT(*), SUM(x) FROM t → { name: "alice", "": [42, 100] }
+   mockQueryResult = {
+     recordset: [{ name: "alice", "": [42, 100] }],
+   }
+   const result = await connector.execute("SELECT name, COUNT(*), SUM(x) FROM t")
+   expect(result.rows).toEqual([["alice", 42, 100]])
+   expect(result.columns[0]).toBe("name")
+ })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 483 - 489, Add a new test variant in packages/drivers/test/sqlserver-unit.test.ts that uses a recordset like { name: "alice", "": [42, 100] } (i.e., multiple unnamed values) and assert both result.rows and result.columns so the header preserves the named column "name" and includes generated column names for the unnamed values (e.g., ["name","col_1","col_2"]); this will surface the bug referenced around sqlserver.ts (logic near lines handling column mapping) and prevent regressions in connector.execute's column assembly.

packages/opencode/test/altimate/data-diff-dialect.test.ts (1)
51-54: Consider expanding case-insensitivity coverage.
The case-insensitivity block only exercises `sqlserver` and `postgresql`. Since `mssql`, `fabric`, and `mariadb` go through the same normalization path, a one-line additional assertion (e.g., `MSSQL` → `tsql`, `FABRIC` → `fabric`) would guard against a future regression where only some branches get lowercased.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/altimate/data-diff-dialect.test.ts` around lines 51 - 54, Add assertions to the "handles uppercase input" test to cover other inputs that share the normalization path so case-insensitivity can't regress: update the test block in packages/opencode/test/altimate/data-diff-dialect.test.ts that calls warehouseTypeToDialect to also assert uppercase variants such as "MSSQL" -> "tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever expected dialects map in warehouseTypeToDialect) so all branches that are lowercased are exercised.

packages/drivers/src/sqlserver.ts (1)
89-100: `execSync` blocks the event loop during `connect()`.
`execSync("az account get-access-token ...")` with a 15 s timeout runs synchronously on the main thread inside an `async` function — any other pending I/O on the runtime is stalled while the Azure CLI spins up (typically 0.5–2 s, occasionally much longer). Prefer the async `execFile` (or `promisify(exec)`) so the token fetch cooperates with the event loop.

♻️ Suggested change

```diff
- try {
-   const { execSync } = await import("node:child_process")
-   const out = execSync(
-     "az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv",
-     { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] },
-   ).trim()
-   if (out) token = out
- } catch {
-   // az CLI not installed or not logged in
- }
+ try {
+   const { execFile } = await import("node:child_process")
+   const { promisify } = await import("node:util")
+   const run = promisify(execFile)
+   const { stdout } = await run(
+     "az",
+     ["account", "get-access-token", "--resource", "https://database.windows.net/", "--query", "accessToken", "-o", "tsv"],
+     { encoding: "utf-8", timeout: 15000 },
+   )
+   const out = stdout.trim()
+   if (out) token = out
+ } catch {
+   // az CLI not installed or not logged in
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 89 - 100, The current token fetch inside connect() uses sync child process invocation (execSync) which blocks the event loop; replace it with an asynchronous call (e.g., use child_process.execFile or promisified exec via util.promisify) so the Azure CLI call runs without blocking. Locate the block that imports execSync and sets token (the token variable and the try/catch that calls "az account get-access-token ...") and change it to call the async API (await the execFile/promisified exec) with the same args and timeout, handle stdout.trim() to set token, and preserve the existing catch behavior for missing CLI or auth failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.opencode/skills/data-parity/SKILL.md:
- Around line 453-458: The fenced code block showing the pseudo-YAML config (the
block containing type: "fabric", host:
"<workspace-id>-<item-id>.datawarehouse.fabric.microsoft.com", database:
"<warehouse-name>", authentication: "azure-active-directory-default") is missing
a language tag; change the opening fence from ``` to ```yaml (or ```text) so
markdownlint MD040 is satisfied and the block is treated as YAML.
In `@packages/drivers/package.json`:
- Around line 20-33: Remove the redundant optional peer dependency entry for
"@azure/identity" from package.json: delete the "@azure/identity" key inside the
"peerDependencies" object and its corresponding entry in "peerDependenciesMeta"
since "mssql"/"tedious" already bundles it; update the package.json so only the
other dependencies (e.g., "mssql", "oracledb", "duckdb", "mongodb",
"@clickhouse/client") remain and no "@azure/identity" references exist under
"peerDependencies" or "peerDependenciesMeta".
In `@packages/drivers/src/sqlserver.ts`:
- Around line 53-54: The code calls rawAuth.toLowerCase() without ensuring
rawAuth is a string, causing a TypeError for non-string authentication values;
update the computation of authType in the sqlserver driver to first check typeof
rawAuth === "string" before using toLowerCase and looking up AUTH_SHORTHANDS,
e.g. only apply AUTH_SHORTHANDS[rawAuth.toLowerCase()] when rawAuth is a string,
otherwise pass rawAuth through (or leave undefined) so non-string auth blocks
(config.authentication) don't blow up at runtime; adjust the logic around
rawAuth, authType, and AUTH_SHORTHANDS to be type-safe and explicit.
- Around line 56-158: The code path that handles authType starting with
"azure-active-directory" (see authType variable and mssqlConfig.authentication
assignments) does not handle unknown subtypes and can leave mssqlConfig without
authentication; add a terminal else branch after the existing
azure-active-directory-* branches that throws a clear Error (e.g. mentioning the
unsupported authType) so unknown or misspelled Azure AD subtypes fail fast;
ensure the thrown error references authType so callers get an explicit message
instead of a later opaque tedious error.
- Around line 76-87: The empty catch around the `@azure/identity` import and
credential.getToken is hiding real authentication errors; change the catch to
capture the thrown error (e.g., catch (err) { azureIdentityErr = err }) and
preserve that error in a local variable, then either append its message/details
to the final thrown error or log it at debug level when token remains undefined;
update the block that currently throws the generic "install `@azure/identity` or
run az login" message to include the captured azureIdentityErr (or its
message/stack) so real AAD/authentication failures are visible while still
tolerating missing-module/browser-bundle cases.
- Around line 208-217: The current columns logic replaces known named headers
when sampleFlat is longer (mixed named + unnamed aggregates); change the
construction in the columns branch to preserve the existing named prefix
(namedKeys) and only synthesize column names for the extra flattened entries
(e.g., generate col_i for indices >= namedKeys.length up to
sampleFlat.length-1), falling back to result.recordset.columns only when both
namedKeys and sampleFlat are empty; update the conditional around
sampleFlat/namedKeys where columns is assigned (the expression that uses
sampleFlat, namedKeys, recordset, and result.recordset.columns) so known headers
are retained and only the tail is renamed.
- Around line 162-167: Remove the fallback to the global shared pool to avoid
reintroducing cross-database isolation; in the block that currently checks
MssqlConnectionPool, ensure that if MssqlConnectionPool is falsy you throw an
explicit error instead of calling mssql.connect, i.e., keep the branch that
constructs and connects a new MssqlConnectionPool(mssqlConfig) and replace the
else branch with a throw that references MssqlConnectionPool and mssql.connect
in the message so reviewers can quickly locate and understand the change.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 154-159: The test "empty result returns correctly" uses a fake
mssql response shape by setting mockQueryResult.recordset_columns, but the
driver reads result.recordset?.columns; update the test to match real mssql
shape by removing recordset_columns and instead set mockQueryResult.recordset =
[] and, if you need to include column metadata, set
mockQueryResult.recordset.columns = {} (or leave absent for empty results) so
the driver code that checks result.recordset?.columns and the code paths in
connector.execute behave against the real response shape.
---
Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 89-100: The current token fetch inside connect() uses sync child
process invocation (execSync) which blocks the event loop; replace it with an
asynchronous call (e.g., use child_process.execFile or promisified exec via
util.promisify) so the Azure CLI call runs without blocking. Locate the block
that imports execSync and sets token (the token variable and the try/catch that
calls "az account get-access-token ...") and change it to call the async API
(await the execFile/promisified exec) with the same args and timeout, handle
stdout.trim() to set token, and preserve the existing catch behavior for missing
CLI or auth failures.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 483-489: Add a new test variant in
packages/drivers/test/sqlserver-unit.test.ts that uses a recordset like { name:
"alice", "": [42, 100] } (i.e., multiple unnamed values) and assert both
result.rows and result.columns so the header preserves the named column "name"
and includes generated column names for the unnamed values (e.g.,
["name","col_1","col_2"]); this will surface the bug referenced around
sqlserver.ts (logic near lines handling column mapping) and prevent regressions
in connector.execute's column assembly.
In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 815-820: Summary: crossWarehouse currently does a raw textual
compare (sourceWarehouse !== targetWarehouse) which can misclassify undefined vs
an explicit default warehouse name as cross-warehouse. Fix: canonicalize both
sides before comparing by resolving undefined/empty values to the
effective/default warehouse name used elsewhere (use the same resolver used for
connection/effective settings) instead of comparing raw params.source_warehouse
and params.target_warehouse; update the variables referenced (sourceWarehouse,
targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !==
resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add
one) rather than inventing a new default string.
In `@packages/opencode/test/altimate/data-diff-dialect.test.ts`:
- Around line 51-54: Add assertions to the "handles uppercase input" test to
cover other inputs that share the normalization path so case-insensitivity can't
regress: update the test block in
packages/opencode/test/altimate/data-diff-dialect.test.ts that calls
warehouseTypeToDialect to also assert uppercase variants such as "MSSQL" ->
"tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever expected
dialects map in warehouseTypeToDialect) so all branches that are lowercased are
exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6589a3e0-5c1a-4ac6-bd8a-e405334413ca
⛔ Files ignored due to path filters (1)
`bun.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- .opencode/skills/data-parity/SKILL.md
- packages/drivers/package.json
- packages/drivers/src/normalize.ts
- packages/drivers/src/sqlserver.ts
- packages/drivers/test/sqlserver-unit.test.ts
- packages/opencode/src/altimate/native/connections/data-diff.ts
- packages/opencode/src/altimate/native/connections/registry.ts
- packages/opencode/src/altimate/tools/data-diff.ts
- packages/opencode/test/altimate/connections.test.ts
- packages/opencode/test/altimate/data-diff-cte.test.ts
- packages/opencode/test/altimate/data-diff-dialect.test.ts
- packages/opencode/test/altimate/driver-normalize.test.ts
Fixes five correctness, reliability, and portability issues surfaced by the consensus code review of this branch.

CRITICAL #1 — Cross-dialect partitioned diff (`data-diff.ts`): `runPartitionedDiff` built one partition WHERE clause with `sourceDialect` and passed it as shared `where_clause` to the recursive `runDataDiff`, which applied it to both warehouses identically. Cross-dialect partition mode (MSSQL → Postgres) failed because the target received T-SQL `DATETRUNC`/`CONVERT(DATE, …, 23)`. Now builds per-side WHERE using each warehouse's dialect and bakes it into dialect-quoted subquery SQL for source and target independently. The existing side-aware CTE injection handles the rest.

MAJOR #2 — Azure AD token caching and refresh (`sqlserver.ts`): `acquireAzureToken` fetched a fresh token on every `connect()` and embedded it in the pool config with no refresh. Long-lived sessions silently failed when the ~1h token expired. Adds a module-scoped cache keyed by `(resource, client_id)` with proactive refresh 5 min before expiry, parsing `expiresOnTimestamp` from `@azure/identity` or the JWT `exp` claim from the `az` CLI fallback. Exposes `_resetTokenCacheForTests` for isolation.

MAJOR #3 — `joindiff` + cross-warehouse guard (`data-diff.ts`): Explicit `algorithm: "joindiff"` combined with different warehouses produced broken SQL (one task referencing two CTE aliases with only one injected). Now returns an early error with a clear message steering users to `hashdiff` or `auto`. Cross-warehouse detection switched from warehouse-name string compare to dialect compare, matching the underlying SQL-divergence invariant.

MAJOR #4 — Dialect-aware identifier quoting in CTE wrapping (`data-diff.ts`): `resolveTableSources` wrapped plain-table names with ANSI double-quotes for all dialects. T-SQL/Fabric require `QUOTED_IDENTIFIER ON` for this to work; the default for `mssql`/tedious is ON, but user contexts (stored procs, legacy collations) can override it. Now accepts source/target dialect parameters and delegates to `quoteIdentForDialect`, which was hoisted to module scope so it can be reused across partition and CTE paths.

MAJOR #5 — Configurable Azure resource URL (`sqlserver.ts`, `normalize.ts`): Token acquisition hardcoded `https://database.windows.net/`, blocking Azure Government, Azure China, and sovereign-cloud customers. Now honours an explicit `azure_resource_url` config field and otherwise infers the URL from the host suffix (`.usgovcloudapi.net`, `.chinacloudapi.cn`). Adds the usual camelCase/snake_case aliases in the SQL Server normalizer. Also surfaces Azure auth error causes: if both `@azure/identity` and the `az` CLI fail, the thrown error includes both hints (redacted) so users know why, rather than seeing the generic "install @azure/identity or run az login" message.

Tests: adds `data-diff-cross-dialect.test.ts` covering the cross-dialect partition WHERE routing and the `joindiff` guard; extends `data-diff-cte.test.ts` with dialect-aware quoting assertions for tsql, fabric, and mysql; extends `sqlserver-unit.test.ts` with cache-hit / expiry-refresh / client-id-keyed cache tests, commercial/gov/china/custom resource URL resolution, and the combined-error-hints surface. All 41 sqlserver driver tests, 24 data-diff orchestrator tests, and 214 normalize/connections tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
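The token caching scheme described for the Azure AD fix can be sketched roughly as follows. This is an illustration only, not the PR's actual code: `parseJwtExpiry`, `getCachedToken`, and `putToken` are hypothetical names, and the real driver additionally handles the `expiresOnTimestamp` path from `@azure/identity`.

```typescript
// Illustrative sketch of a module-scoped token cache with proactive refresh.
// Names are hypothetical; the real implementation lives in sqlserver.ts.
type CachedToken = { token: string; expiresAt: number }

const tokenCache = new Map<string, CachedToken>()
const REFRESH_MARGIN_MS = 5 * 60 * 1000 // refresh 5 min before expiry

// Parse the `exp` claim (seconds since epoch) out of a JWT payload.
function parseJwtExpiry(jwt: string): number {
  const payload = jwt.split(".")[1]
  if (!payload) return 0
  try {
    const normalized = payload.replace(/-/g, "+").replace(/_/g, "/")
    const claims = JSON.parse(Buffer.from(normalized, "base64").toString("utf-8"))
    return typeof claims.exp === "number" ? claims.exp * 1000 : 0
  } catch {
    return 0
  }
}

function getCachedToken(resource: string, clientId?: string): string | undefined {
  const key = `${resource}|${clientId ?? ""}`
  const hit = tokenCache.get(key)
  // Serve the cached token only while it is outside the refresh margin.
  if (hit && hit.expiresAt - REFRESH_MARGIN_MS > Date.now()) return hit.token
  return undefined
}

function putToken(resource: string, clientId: string | undefined, token: string): void {
  tokenCache.set(`${resource}|${clientId ?? ""}`, { token, expiresAt: parseJwtExpiry(token) })
}
```

A token whose `exp` falls inside the 5-minute margin reads as a cache miss, which is what forces the proactive refresh the commit message describes.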
Pull request overview
This PR adds first-class SQL Server (MSSQL) and Microsoft Fabric support to the data-parity/data-diff workflow, including dialect mapping to Rust engine dialect names, cross-warehouse-safe CTE injection, and expanded driver authentication/normalization/docs.
Changes:
- Add SQL Server driver capabilities (Azure AD auth flows, TOP injection, schema introspection, result shaping) and upgrade `mssql` to v12.
- Add warehouse-type → Rust dialect mapping (`sqlserver`/`mssql` → `tsql`, `fabric` → `fabric`, etc.) plus dialect-aware quoting and side-specific CTE injection for SQL-query mode and partitioned diffs.
- Extend registry/normalization/tests/docs to recognize `fabric` as a supported connection type routed through the SQL Server driver.
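The mapping and quoting behavior summarized above can be sketched like this. It is a hypothetical condensed version; the real tables live in `data-diff.ts` and cover more warehouse types.

```typescript
// Hypothetical condensed sketch of the warehouse-type → engine-dialect mapping
// and the dialect-aware identifier quoting this PR describes.
const DIALECT_BY_TYPE: Record<string, string> = {
  sqlserver: "tsql",
  mssql: "tsql",
  fabric: "fabric",
  postgres: "postgres",
  mysql: "mysql",
}

function toEngineDialect(warehouseType: string): string {
  return DIALECT_BY_TYPE[warehouseType.toLowerCase()] ?? "ansi"
}

// T-SQL/Fabric get [brackets] (valid regardless of QUOTED_IDENTIFIER),
// MySQL gets backticks, everything else gets ANSI double quotes.
function quoteIdentForDialect(ident: string, dialect: string): string {
  if (dialect === "tsql" || dialect === "fabric") return `[${ident.replace(/]/g, "]]")}]`
  if (dialect === "mysql") return "`" + ident.replace(/`/g, "``") + "`"
  return `"${ident.replace(/"/g, '""')}"`
}
```

Bracket quoting is why the MAJOR #4 fix sidesteps the `QUOTED_IDENTIFIER` dependency: `[dbo]` parses the same whether or not the session honors double-quoted identifiers.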
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/opencode/test/altimate/driver-normalize.test.ts | Adds coverage ensuring fabric configs normalize using SQL Server aliasing. |
| packages/opencode/test/altimate/data-diff-dialect.test.ts | Adds tests for warehouse-type → Rust dialect mapping. |
| packages/opencode/test/altimate/data-diff-cte.test.ts | Adds tests for query detection, dialect-aware quoting, and side-specific CTE injection. |
| packages/opencode/test/altimate/data-diff-cross-dialect.test.ts | Adds tests for cross-dialect partition filtering and joindiff cross-warehouse guard behavior. |
| packages/opencode/test/altimate/connections.test.ts | Adds coverage for fabric being recognized/registered as a connection type. |
| packages/opencode/src/altimate/tools/data-diff.ts | Adjusts diff output formatting for missing diff row values. |
| packages/opencode/src/altimate/native/connections/registry.ts | Adds fabric → sqlserver driver routing. |
| packages/opencode/src/altimate/native/connections/data-diff.ts | Implements dialect mapping, dialect-aware quoting, side-specific CTE injection, and cross-dialect partition handling. |
| packages/drivers/test/sqlserver-unit.test.ts | Adds extensive unit tests for the SQL Server driver, auth flows, and result shaping. |
| packages/drivers/src/sqlserver.ts | Implements SQL Server driver with Azure AD token acquisition/caching, ConnectionPool isolation, and result shaping. |
| packages/drivers/src/normalize.ts | Adds SQL Server/Fabric config alias normalization for auth and Azure fields. |
| packages/drivers/package.json | Upgrades mssql dependency and adds optional peer dependency on @azure/identity. |
| bun.lock | Updates lockfile for mssql/tedious upgrade and peer dependency metadata. |
| .opencode/skills/data-parity/SKILL.md | Documents SQL Server/Fabric configuration, requirements, and behavior. |
```diff
 for (const d of diffRows.slice(0, 5)) {
   const label = d.sign === "-" ? "source only" : "target only"
-  lines.push(`  [${label}] ${d.values?.join(" | ")}`)
+  lines.push(`  [${label}] ${d.values?.join(" | ") ?? "(no values)"}`)
```
The new `(no values)` fallback won’t trigger for `values: []` because `[].join(" | ")` returns an empty string (not null/undefined). If the intent is to show the placeholder when there are zero values, check `d.values?.length` (or similar) instead of using `??` on the joined string.
```diff
-      lines.push(`  [${label}] ${d.values?.join(" | ") ?? "(no values)"}`)
+      const values = d.values?.length ? d.values.join(" | ") : "(no values)"
+      lines.push(`  [${label}] ${values}`)
```
```diff
 const rows = limitedRecordset.map(flattenRow)
+const sampleFlat = rows.length > 0 ? rows[0] : []
+const namedKeys = recordset.length > 0 ? Object.keys(recordset[0]) : []
 const columns =
-  rows.length > 0
-    ? Object.keys(rows[0]).filter((k) => !k.startsWith("_"))
-    : (result.recordset?.columns
-        ? Object.keys(result.recordset.columns)
-        : [])
+  namedKeys.length === sampleFlat.length
+    ? namedKeys
+    : sampleFlat.length > 0
+      ? sampleFlat.map((_: any, i: number) => `col_${i}`)
+      : (result.recordset?.columns
+          ? Object.keys(result.recordset.columns)
+          : [])
 const truncated = effectiveLimit > 0 && rows.length > effectiveLimit
 const limitedRows = truncated ? rows.slice(0, effectiveLimit) : rows
```
Column name derivation becomes inconsistent when the mssql driver returns unnamed columns under the empty-string key (`{"": [..]}`) and there are named columns. With `{ name: "alice", "": [42] }`, `namedKeys.length === sampleFlat.length`, so `columns` becomes `["name", ""]`, leaving an empty column name and making it ambiguous/misaligned for callers. Consider building `columns` in the same loop as `flattenRow` (expanding the empty-string key into synthetic `col_N` names) so `columns.length` always matches `rows[i].length` and never contains `""`.
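One way to keep names and values aligned, as the comment suggests, is to derive both in a single pass over the row. This sketch assumes the mssql row shape described above, where unnamed or duplicate columns arrive under the empty-string key (possibly as an array); the helper name is illustrative, not the driver's actual API.

```typescript
// Build column names and flattened values together so they can never drift
// apart: the empty-string key is expanded into synthetic col_N names that
// line up one-to-one with its array entries.
function flattenWithColumns(row: Record<string, unknown>): { columns: string[]; values: unknown[] } {
  const columns: string[] = []
  const values: unknown[] = []
  let i = 0
  for (const [k, v] of Object.entries(row)) {
    if (k === "" && Array.isArray(v)) {
      for (const item of v) {
        columns.push(`col_${i++}`)
        values.push(item)
      }
    } else {
      columns.push(k === "" ? `col_${i}` : k)
      values.push(v)
      i++
    }
  }
  return { columns, values }
}
```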
```ts
// Cross-warehouse mode requires side-specific CTE injection: T-SQL / Fabric
// parse-bind every CTE body even when unreferenced, so sending the combined
// prefix to a warehouse that lacks the other side's base table fails at parse.
// Detect by dialect compare — more robust than warehouse-name compare, and
// matches the underlying invariant that we care about (SQL-text divergence).
const crossWarehouse = dialect1 !== dialect2

// Explicit JoinDiff cannot work across warehouses: it emits one FULL OUTER
// JOIN task referencing both CTE aliases, but side-aware injection only
// defines one side per task — the other alias would be unresolved. Guard
// early so users get a clear error instead of an obscure SQL parse failure.
if (params.algorithm === "joindiff" && crossWarehouse) {
```
`crossWarehouse` is computed from `dialect1 !== dialect2`, but cross-warehouse behavior/guards should depend on whether `source_warehouse` and `target_warehouse` are different connections. If both sides are SQL Server (same dialect) but different warehouses, this will incorrectly allow `algorithm: "joindiff"` and can route a JOIN query to a warehouse that cannot resolve the other side’s table. Compute cross-warehouse based on warehouse identity (e.g. `params.target_warehouse && params.target_warehouse !== params.source_warehouse`) and keep dialect comparison only for SQL-text/dialect branching.
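The distinction this comment draws can be captured as two separate predicates. The parameter names below are assumed from the surrounding discussion, not taken from the actual module:

```typescript
// Branch on warehouse identity for cross-warehouse guards (can this SQL even
// reach both tables?), and on dialect only for SQL-text generation (which
// flavor of SQL do we emit?). The two questions are independent.
function isCrossWarehouse(params: { source_warehouse: string; target_warehouse?: string }): boolean {
  return !!params.target_warehouse && params.target_warehouse !== params.source_warehouse
}

function isCrossDialect(dialect1: string, dialect2: string): boolean {
  return dialect1 !== dialect2
}
```

Two SQL Server connections on different hosts would be cross-warehouse but not cross-dialect, which is exactly the case the dialect-only check misses.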
Actionable comments posted: 2
♻️ Duplicate comments (3)
packages/drivers/src/sqlserver.ts (2)
255-262: ⚠️ Potential issue | 🟡 Minor — Fallback to `mssql.connect()` still defeats per-pool isolation.

This was raised in a prior review and remains unresolved. Since `packages/drivers/package.json` now pins `mssql@^12.0.0`, where `ConnectionPool` is a stable named export, `MssqlConnectionPool` should never be falsy in practice; if it somehow is, silently using the global shared pool re-introduces the cross-database interference bug the explicit `ConnectionPool` path was added to fix. Throw instead.

🔒️ Proposed fix
```diff
-    if (MssqlConnectionPool) {
-      pool = new MssqlConnectionPool(mssqlConfig)
-      await pool.connect()
-    } else {
-      pool = await mssql.connect(mssqlConfig)
-    }
+    if (!MssqlConnectionPool) {
+      throw new Error(
+        "mssql.ConnectionPool is not available — the installed `mssql` package is too old. Upgrade to mssql@^12.",
+      )
+    }
+    pool = new MssqlConnectionPool(mssqlConfig)
+    await pool.connect()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 255 - 262, The fallback branch that calls mssql.connect() reintroduces the global shared pool bug; replace the fallback with a hard failure instead of silently using the global pool: when MssqlConnectionPool is falsy, throw a descriptive error indicating ConnectionPool is required (referencing MssqlConnectionPool, ConnectionPool, and the pool initialization logic) so callers know the driver cannot operate without the explicit ConnectionPool; do not call mssql.connect() anywhere in this module.
302-312: ⚠️ Potential issue | 🟡 Minor — Mixed named + multi-unnamed columns still drop named headers.

Already flagged in an earlier review and not addressed. For `SELECT name, COUNT(*), SUM(x) FROM t` (recordset row `{ name: "alice", "": [42, 100] }`), `namedKeys.length` is 2 but `sampleFlat.length` is 3, so the branch falls through to synthesizing `col_0..col_2`, losing the real `name` header. Preserve the known prefix and only synthesize names for the flattened tail.

♻️ Proposed fix
```diff
-  const rows = limitedRecordset.map(flattenRow)
-  const sampleFlat = rows.length > 0 ? rows[0] : []
-  const namedKeys = recordset.length > 0 ? Object.keys(recordset[0]) : []
-  const columns =
-    namedKeys.length === sampleFlat.length
-      ? namedKeys
-      : sampleFlat.length > 0
-        ? sampleFlat.map((_: any, i: number) => `col_${i}`)
-        : (result.recordset?.columns
-            ? Object.keys(result.recordset.columns)
-            : [])
+  const rows = limitedRecordset.map(flattenRow)
+  const sampleFlat = rows.length > 0 ? rows[0] : []
+  const sampleRow = recordset[0] ?? {}
+  const columns: string[] = []
+  if (sampleFlat.length === 0) {
+    if (result.recordset?.columns) columns.push(...Object.keys(result.recordset.columns))
+  } else {
+    let i = 0
+    for (const [k, v] of Object.entries(sampleRow)) {
+      if (k === "" && Array.isArray(v)) {
+        for (let j = 0; j < v.length; j++) columns.push(`col_${i++}`)
+      } else {
+        columns.push(k === "" ? `col_${i}` : k)
+        i++
+      }
+    }
+  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 302 - 312, The code currently drops known named headers when some columns are flattened (variables: limitedRecordset, flattenRow, rows, sampleFlat, namedKeys, columns), so change the columns construction to preserve the namedKeys prefix and only synthesize names for any extra flattened entries: if namedKeys.length === sampleFlat.length use namedKeys; if namedKeys.length < sampleFlat.length set columns = [...namedKeys, ...sampleFlat.slice(namedKeys.length).map((_, i) => `col_${namedKeys.length + i}`)]; otherwise (sampleFlat shorter) keep the current fallback logic that uses sampleFlat or result.recordset?.columns as before. Ensure this logic replaces the current ternary chain building columns.packages/drivers/test/sqlserver-unit.test.ts (1)
176-181: ⚠️ Potential issue | 🟡 Minor — `recordset_columns` is still not a real mssql shape.

Previously flagged; the key remains incorrect. `mssql` exposes metadata at `result.recordset.columns`, not as a sibling `recordset_columns`, and the driver (see `sqlserver.ts` line 310) reads `result.recordset?.columns`. The test passes only because `recordset` is empty, so future readers will copy the wrong pattern.

💚 Proposed fix
```diff
-    mockQueryResult = { recordset: [], recordset_columns: {} }
+    const recordset: any = []
+    recordset.columns = {}
+    mockQueryResult = { recordset }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 176 - 181, The test constructs a fake mssql result with the wrong metadata shape—using mockQueryResult.recordset_columns instead of the real mssql shape result.recordset.columns—so update the test ("empty result returns correctly" in packages/drivers/test/sqlserver-unit.test.ts) to build mockQueryResult with recordset: [] and an empty columns object under recordset.columns (i.e., mockQueryResult.recordset = Object.assign([], { columns: {} })) so the driver code in sqlserver.ts (which reads result.recordset?.columns) exercises the correct shape when calling connector.execute("SELECT * FROM t").
🧹 Nitpick comments (3)
packages/opencode/test/altimate/data-diff-cross-dialect.test.ts (1)
114-127: Strengthen the same-warehouse joindiff positive-path assertions.

This test only checks `result.success === true`, which passes as long as no guard fires — it won’t catch regressions where joindiff silently degrades (e.g., falls back to hashdiff, skips execution, or produces empty SQL). Since the negative test above asserts `sqlLog.length === 0`, the symmetric positive assertion is cheap and much stronger.

🧪 Suggested additional assertions
```diff
   const result = await runDataDiff({
     source: "dbo.orders",
     target: "dbo.orders_v2",
     key_columns: ["id"],
     source_warehouse: "shared",
     target_warehouse: "shared",
     algorithm: "joindiff",
   })
   expect(result.success).toBe(true)
+  // Joindiff must actually dispatch SQL to the shared warehouse
+  expect(sqlLog.length).toBeGreaterThan(0)
+  expect(sqlLog.every((x) => x.warehouse === "shared")).toBe(true)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts` around lines 114 - 127, The test "same-warehouse joindiff is allowed" currently only checks result.success; strengthen it by also asserting that the joindiff actually executed SQL instead of silently degrading: after calling runDataDiff (the test using Registry.setConfigs and runDataDiff), assert that sqlLog (the request/SQL capture used in other tests) contains entries (e.g., sqlLog.length > 0) and that the executed statement or logged payload indicates joindiff behavior (e.g., contains "JOIN" or the joindiff-specific marker), so failures where joindiff falls back or produces no SQL are caught.packages/drivers/src/sqlserver.ts (2)
170-186: `execSync` blocks the event loop inside an async path.

The Azure CLI fallback uses synchronous `execSync` with a 15-second timeout inside `acquireAzureToken()`, which will stall the entire event loop while waiting for `az account get-access-token`. Prefer `execFile` promisified (or `spawn`) so that other tasks (e.g. in-flight pool requests, other connectors initializing in parallel) continue to progress. Also, passing the command as a single formatted string through `execSync` is a minor shell-injection concern if `resourceUrl` is ever derived from untrusted input — `execFile` with an argv array avoids that entirely.

♻️ Suggested approach
```diff
-  try {
-    const { execSync } = await import("node:child_process")
-    const out = execSync(
-      `az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
-      { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] },
-    ).trim()
+  try {
+    const { execFile } = await import("node:child_process")
+    const { promisify } = await import("node:util")
+    const execFileP = promisify(execFile)
+    const { stdout } = await execFileP(
+      "az",
+      ["account", "get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o", "tsv"],
+      { encoding: "utf-8", timeout: 15000 },
+    )
+    const out = stdout.trim()
     if (out) {
       token = out
       expiresAt = parseTokenExpiry(out)
     }
   } catch (err: any) {
     azCliStderr = String(err?.stderr ?? err?.message ?? "").slice(0, 200).trim()
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 170 - 186, The use of synchronous execSync inside acquireAzureToken blocks the event loop; replace it with a non-blocking child process call (e.g., promisified execFile or spawn) to run "az" with arguments ["account","get-access-token","--resource", resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr asynchronously, enforce the 15s timeout via timers or the child process API, and on success set token and expiresAt (via parseTokenExpiry(out)); on failure capture stderr/message into azCliStderr (trim to 200 chars) and preserve existing error-handling flow so behavior of token, expiresAt, and azCliStderr remains the same.
140-204: Concurrent cache-miss calls will parallel-fetch the same token.

If two `connect()` calls for the same `(resource, clientId)` start before either populates the cache, both will independently run `DefaultAzureCredential.getToken()` (or the `az` CLI) and both write to the cache. This is wasteful for AAD throttling and surprising during app startup where several diffs connect in parallel. Caching the in-flight `Promise<string>` (rather than the resolved token) de-duplicates concurrent acquisitions for free — and on rejection you simply evict and let the next caller retry.

```ts
const inflight = new Map<string, Promise<string>>()

// inside acquireAzureToken:
const existing = inflight.get(cacheKey)
if (existing) return existing
const p = (async () => { /* existing acquisition body */ })()
  .finally(() => inflight.delete(cacheKey))
inflight.set(cacheKey, p)
return p
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 140 - 204, acquireAzureToken currently allows concurrent cache-miss callers to run duplicate AAD/az CLI fetches; introduce an inflight Map<string, Promise<string>> (e.g., inflight) keyed by cacheKey and at the start of acquireAzureToken return an existing inflight.get(cacheKey) if present, otherwise wrap the entire acquisition logic in an async Promise, set inflight.set(cacheKey, promise) before awaiting it, and in promise.finally() delete the inflight entry so failed attempts can be retried; keep using tokenCache as before (check cached token first, write resolved token into tokenCache) and on rejection ensure the inflight entry is removed so subsequent callers will create a new fetch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 576-596: The "different clientIds cache separately" test currently
only asserts both connections received "shared-token" which doesn't prove
separate caching; modify the test in the test block named "different clientIds
cache separately" to either (1) make azureIdentityState.tokenOverride return
distinct tokens per call (so the first connect yields "token-1" and the second
"token-2") and assert mockConnectCalls[0].authentication.options.token and
mockConnectCalls[1].authentication.options.token differ accordingly, or (2) keep
the shared-token but assert the underlying getToken mock was invoked twice (e.g.
check the getToken invocation counter after resetMocks and the two connect()
calls equals 2) to prove cache keying is per azure_client_id; refer to
azureIdentityState.tokenOverride, resetMocks(), connect(...), and
mockConnectCalls to implement the change.
In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 27-42: The module-level mocks created with
mock.module("@altimateai/altimate-core", ...) (and the registry mock) can leak
across tests; add cleanup by calling mock.restoreModule() in an afterEach (or
afterAll) to restore original module exports, or replace the module-level
mock.module usage with scoped spyOn calls inside the test suite setup (e.g.,
inside beforeEach) so the mock is not process-scoped; update the test file to
call mock.restoreModule() after each test or convert the mocks to spyOn to
prevent cross-file leakage.
---
Duplicate comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 255-262: The fallback branch that calls mssql.connect()
reintroduces the global shared pool bug; replace the fallback with a hard
failure instead of silently using the global pool: when MssqlConnectionPool is
falsy, throw a descriptive error indicating ConnectionPool is required
(referencing MssqlConnectionPool, ConnectionPool, and the pool initialization
logic) so callers know the driver cannot operate without the explicit
ConnectionPool; do not call mssql.connect() anywhere in this module.
- Around line 302-312: The code currently drops known named headers when some
columns are flattened (variables: limitedRecordset, flattenRow, rows,
sampleFlat, namedKeys, columns), so change the columns construction to preserve
the namedKeys prefix and only synthesize names for any extra flattened entries:
if namedKeys.length === sampleFlat.length use namedKeys; if namedKeys.length <
sampleFlat.length set columns = [...namedKeys,
...sampleFlat.slice(namedKeys.length).map((_, i) => `col_${namedKeys.length +
i}`)]; otherwise (sampleFlat shorter) keep the current fallback logic that uses
sampleFlat or result.recordset?.columns as before. Ensure this logic replaces
the current ternary chain building columns.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 176-181: The test constructs a fake mssql result with the wrong
metadata shape—using mockQueryResult.recordset_columns instead of the real mssql
shape result.recordset.columns—so update the test ("empty result returns
correctly" in packages/drivers/test/sqlserver-unit.test.ts) to build
mockQueryResult with recordset: [] and an empty columns object under
recordset.columns (i.e., mockQueryResult.recordset = Object.assign([], {
columns: {} })) so the driver code in sqlserver.ts (which reads
result.recordset?.columns) exercises the correct shape when calling
connector.execute("SELECT * FROM t").
---
Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 170-186: The use of synchronous execSync inside acquireAzureToken
blocks the event loop; replace it with a non-blocking child process call (e.g.,
promisified execFile or spawn) to run "az" with arguments
["account","get-access-token","--resource",
resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr
asynchronously, enforce the 15s timeout via timers or the child process API, and
on success set token and expiresAt (via parseTokenExpiry(out)); on failure
capture stderr/message into azCliStderr (trim to 200 chars) and preserve
existing error-handling flow so behavior of token, expiresAt, and azCliStderr
remains the same.
- Around line 140-204: acquireAzureToken currently allows concurrent cache-miss
callers to run duplicate AAD/az CLI fetches; introduce an inflight Map<string,
Promise<string>> (e.g., inflight) keyed by cacheKey and at the start of
acquireAzureToken return an existing inflight.get(cacheKey) if present,
otherwise wrap the entire acquisition logic in an async Promise, set
inflight.set(cacheKey, promise) before awaiting it, and in promise.finally()
delete the inflight entry so failed attempts can be retried; keep using
tokenCache as before (check cached token first, write resolved token into
tokenCache) and on rejection ensure the inflight entry is removed so subsequent
callers will create a new fetch.
In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 114-127: The test "same-warehouse joindiff is allowed" currently
only checks result.success; strengthen it by also asserting that the joindiff
actually executed SQL instead of silently degrading: after calling runDataDiff
(the test using Registry.setConfigs and runDataDiff), assert that sqlLog (the
request/SQL capture used in other tests) contains entries (e.g., sqlLog.length >
0) and that the executed statement or logged payload indicates joindiff behavior
(e.g., contains "JOIN" or the joindiff-specific marker), so failures where
joindiff falls back or produces no SQL are caught.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dd24dc82-bab2-4760-ab04-8fc4b312c420
📒 Files selected for processing (6)
- packages/drivers/src/normalize.ts
- packages/drivers/src/sqlserver.ts
- packages/drivers/test/sqlserver-unit.test.ts
- packages/opencode/src/altimate/native/connections/data-diff.ts
- packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
- packages/opencode/test/altimate/data-diff-cte.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/drivers/src/normalize.ts
- packages/opencode/test/altimate/data-diff-cte.test.ts
- packages/opencode/src/altimate/native/connections/data-diff.ts
```ts
mock.module("@altimateai/altimate-core", () => ({
  DataParitySession: class {
    constructor(specJson: string) { lastSpec = JSON.parse(specJson) }
    start() { return fakeStartAction }
    step(_responses: string) {
      return JSON.stringify({
        type: "Done",
        outcome: {
          mode: "diff",
          diff_rows: [],
          stats: { rows_table1: 0, rows_table2: 0, exclusive_table1: 0, exclusive_table2: 0, updated: 0, unchanged: 0 },
        },
      })
    }
  },
}))
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find other tests that import the mocked modules and could observe the stubs.
rg -nP --type=ts -C2 '@altimateai/altimate-core|native/connections/(registry|data-diff)' packages/opencode/test
# Check whether any existing test uses mock.restore / module unmocking for these.
rg -nP --type=ts -C2 'mock\.(restore|module)' packages/opencode/test
```

Repository: AltimateAI/altimate-code
Length of output: 41367
🌐 Web query:
Bun test mock.module scope process-level isolation across files
💡 Result:
In Bun test runner (as of 2026-04-17), mock.module does not provide automatic process-level isolation across test files. All tests run in a single shared process by default, causing mock.module calls in one test file to affect (leak to) other test files due to module cache sharing. This is a known limitation documented in open GitHub issues #12823 (Bun mocks to be scoped to test file, 63 thumbs up, active as of 2026) and #6024 (Make bun test isolated). Key evidence: - Official docs confirm single-process execution: "The test runner runs all tests in a single process." No full process isolation like Vitest/Jest --isolate. - mock.module modifies the shared ESM/CJS module caches globally, with live bindings propagating changes. - mock.restore clears function mocks/spies but explicitly does NOT restore module mocks (per docs). - Recent PR #25844 (Jan 2026) adds mock.restoreModule for manual restoration (selective or all), intended for afterEach to improve isolation within the shared process. Claims to fix #12823 but PR status unclear (not confirmed merged). - Workarounds: 1. Run test files individually: for file in *.test.ts; do bun test "$file"; done (each spawns separate process). 2. Use mock.restoreModule in afterEach if available in your Bun version. 3. spyOn on specific methods instead of full module mocks (scoped better). 4. Combine related tests into single file with describe blocks. - No --isolate flag exists; experimental --isolate=lite PR #10210 (2024) closed without merge. - For true process-level isolation across files, Bun lacks native support; use external tools or run files separately. To achieve process-level isolation: script sequential per-file execution sacrifices parallelism but ensures clean mocks per process.
Citations:
- 1: https://bun.com/docs/test/mocks
- 2: https://bun.sh/docs/test/mocks
- 3: Bun mocks to be scoped to test file oven-sh/bun#12823
- 4: Make `bun test` isolated oven-sh/bun#6024
- 6: https://bun.com/docs/test/runtime-behavior
Add module mock cleanup to prevent leakage across test files.
mock.module("@altimateai/altimate-core", …) and mock.module("…/registry", …) have no cleanup in afterEach or afterAll. In Bun, mock.module replacements are process-scoped (single shared process across all test files). Without cleanup, these stubs can leak to other test files that import @altimateai/altimate-core or the registry (e.g., data-diff-cte.test.ts, data-diff-dialect.test.ts, connections.test.ts), causing those tests to silently observe the mock implementations depending on file execution order.
mock.restore() does not restore module mocks—only function/spy mocks. Use mock.restoreModule() in afterEach (available in Bun v1.1.43+, PR #25844) to restore the mocked modules to their original exports. Alternatively, apply the pattern used in check-e2e.test.ts or post-connect-suggestions.test.ts: replace mock.module with targeted spyOn calls scoped to the specific test suite, or restructure the test to avoid module-level mocks.
Also applies to: 69–82
Same leakage risk for the registry mock at lines 69–82.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts` around lines
27 - 42, The module-level mocks created with
mock.module("@altimateai/altimate-core", ...) (and the registry mock) can leak
across tests; add cleanup by calling mock.restoreModule() in an afterEach (or
afterAll) to restore original module exports, or replace the module-level
mock.module usage with scoped spyOn calls inside the test suite setup (e.g.,
inside beforeEach) so the mock is not process-scoped; update the test file to
call mock.restoreModule() after each test or convert the mocks to spyOn to
prevent cross-file leakage.
3 issues found across 6 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:50">
P2: Normalize explicit `azure_resource_url` to include a trailing slash before appending `.default`, otherwise `getToken` can receive an invalid scope string.</violation>
</file>
<file name="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts">
<violation number="1" location="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts:97">
P2: Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.</violation>
</file>
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:870">
P1: `crossWarehouse` is derived from dialect equality, so joindiff is incorrectly allowed across different warehouses that use the same dialect.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
```ts
 */
function resolveAzureResourceUrl(config: ConnectionConfig): string {
  const explicit = config.azure_resource_url as string | undefined
  if (explicit) return explicit
```
P2: Normalize explicit azure_resource_url to include a trailing slash before appending .default, otherwise getToken can receive an invalid scope string.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/drivers/src/sqlserver.ts, line 50:
<comment>Normalize explicit `azure_resource_url` to include a trailing slash before appending `.default`, otherwise `getToken` can receive an invalid scope string.</comment>
<file context>
@@ -4,6 +4,65 @@
+ */
+function resolveAzureResourceUrl(config: ConnectionConfig): string {
+ const explicit = config.azure_resource_url as string | undefined
+ if (explicit) return explicit
+ const host = (config.host as string | undefined) ?? ""
+ if (host.includes(".usgovcloudapi.net") || host.includes(".datawarehouse.fabric.microsoft.us")) {
</file context>
```diff
-if (explicit) return explicit
+if (explicit) return explicit.endsWith("/") ? explicit : `${explicit}/`
```
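The same normalization can be folded into the scope construction itself; a minimal sketch (the function name is hypothetical, not the driver's actual code):

```typescript
// Hypothetical helper illustrating the fix: guarantee exactly one "/"
// between the resource URL and the ".default" suffix, so getToken always
// receives a well-formed scope string.
function toAzureScope(resourceUrl: string): string {
  const base = resourceUrl.endsWith("/") ? resourceUrl : `${resourceUrl}/`
  return `${base}.default`
}

console.log(toAzureScope("https://database.windows.net"))  // https://database.windows.net/.default
console.log(toAzureScope("https://database.windows.net/")) // https://database.windows.net/.default
```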
```ts
test("returns early error when joindiff + cross-warehouse", async () => {
  Registry.setConfigs({
    src: { type: "sqlserver", host: "s1", database: "d" },
    tgt: { type: "postgres", host: "s2", database: "d" },
```
P2: Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/test/altimate/data-diff-cross-dialect.test.ts, line 97:
<comment>Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.</comment>
<file context>
@@ -0,0 +1,168 @@
+ test("returns early error when joindiff + cross-warehouse", async () => {
+ Registry.setConfigs({
+ src: { type: "sqlserver", host: "s1", database: "d" },
+ tgt: { type: "postgres", host: "s2", database: "d" },
+ })
+ const result = await runDataDiff({
</file context>
…lot)
Addresses the remaining issues raised by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer on top of the multi-model consensus fix.
### CRITICAL
- **`@azure/identity` peer dep removed** (`drivers/package.json`)
`mssql@12` → `tedious@19` bundles `@azure/identity ^4.2.1` as a regular
dependency. Declaring it here as an optional peer was redundant and caused
transitive-version-drift concerns. Users get the correct version
automatically through the tedious chain; our lazy import handles the
browser-bundle edge case itself.
### MAJOR
- **Cross-dialect date partition literal normalization** (`data-diff.ts`)
`buildPartitionDiscoverySQL` on MSSQL returns a JS `Date` object, stringified
upstream as `"Mon Jan 01 2024 …"`. `CONVERT(DATE, …, 23)` rejects that format.
Normalize `partitionValue` to ISO `yyyy-mm-dd` before dialect casting so the
T-SQL/Fabric path works end-to-end on dates discovered from MSSQL sources.
- **`crossWarehouse` uses resolved warehouse identity** (`data-diff.ts`)
Previous commit gated on dialect compare, which treated two independent
MSSQL instances as "same warehouse" and would have let `joindiff` route a
JOIN through a warehouse that can't resolve the other side's base tables.
Now resolves both sides' warehouse name (falling back to the default
warehouse when a side is omitted) and compares identities — identity-based
gating handles both the "undefined vs default" case (cubic) and the
"same-dialect, different instance" case (Copilot).
- **Drop `mssql.connect()` fallback** (`sqlserver.ts`)
`mssql@^12` guarantees `ConnectionPool` as a named export. The fallback
silently re-introduced the global-shared-pool bug this branch was added
to fix. Now throws a descriptive error if `ConnectionPool` is missing —
cross-database pool interference cannot regress.
- **Non-string `config.authentication` guarded** (`sqlserver.ts`)
Caller passing a pre-built `{ type, options }` block (or `null`) previously
crashed with `TypeError: rawAuth.toLowerCase is not a function`. Now only
applies the shorthand lookup when `rawAuth` is a string; other values pass
through so tedious can handle them or reject them with its own error.
- **Unknown `azure-active-directory-*` subtype fails fast** (`sqlserver.ts`)
Typos or future tedious subtypes previously dropped through all `else if`
branches, producing a config with `encrypt: true` but no `authentication`
block. tedious then surfaced an opaque error far from the root cause.
Now throws with the offending subtype and the supported list.
- **`execSync` replaced with async `exec`** (`sqlserver.ts`)
The `az account get-access-token` CLI fallback previously blocked the
event loop for up to 15s. Switched to `util.promisify(exec)` so the
connection path stays non-blocking.
- **Mixed named + unnamed column derivation preserves headers** (`sqlserver.ts`)
Previously `SELECT name, COUNT(*), SUM(x)` produced either `["name", ""]`
(blank header) or `["col_0", "col_1", "col_2"]` (lost `name`). Rewrote
column/row derivation to iterate in one pass, preserving known named
columns and synthesizing `col_N` only for expanded `""`-key positions.
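The mixed-column derivation described above can be sketched as follows, under the assumption (stated in the commit, not verified against mssql internals here) that unnamed expressions share a single `""` key whose value is an array:

```typescript
// Sketch of one-pass header/value derivation for a row like the one
// `SELECT name, COUNT(*), SUM(x)` is described as producing:
//   { name: "a", "": [3, 10] }
// Known named columns are preserved; col_N is synthesized only for the
// expanded ""-key positions.
function flattenRow(row: Record<string, unknown>): { headers: string[]; values: unknown[] } {
  const headers: string[] = []
  const values: unknown[] = []
  let synth = 0
  for (const [key, value] of Object.entries(row)) {
    if (key !== "") {
      headers.push(key)
      values.push(value)
      continue
    }
    // "" collects every unnamed expression; expand it in place
    const unnamed = Array.isArray(value) ? value : [value]
    for (const v of unnamed) {
      headers.push(`col_${synth++}`)
      values.push(v)
    }
  }
  return { headers, values }
}

const r = flattenRow({ name: "a", "": [3, 10] })
console.log(r.headers) // ["name", "col_0", "col_1"]
console.log(r.values)  // ["a", 3, 10]
```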
### MINOR
- **`(no values)` fallback for empty `diff_row.values` array** (`tools/data-diff.ts`)
`[].join(" | ") ?? "(no values)"` never fires because `""` is falsy-but-not-
nullish. Gate on `d.values?.length` instead.
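The `??` pitfall in that bullet is easy to reproduce: `[].join(" | ")` returns `""`, which is falsy but not nullish, so the `??` fallback never fires.

```typescript
const values: string[] = []

// "" is falsy but not null/undefined, so ?? passes it through unchanged
const broken = values.join(" | ") ?? "(no values)"
console.log(JSON.stringify(broken)) // ""

// gating on length, as the fix does, produces the intended placeholder
const fixed = values.length ? values.join(" | ") : "(no values)"
console.log(fixed) // (no values)
```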
### Test / docs
- `sqlserver-unit.test.ts`: token-cache client-id test now counts actual
`getToken` invocations (previous version only verified both got the same
mocked token, which proved nothing about keying).
- `sqlserver-unit.test.ts`: "empty result" test now mirrors the real mssql
shape (`recordset.columns` is a property *on* the recordset array, not a
sibling key).
- `sqlserver-unit.test.ts`: added mixed-column regression tests — "name +
COUNT + SUM" and "single unnamed column" — to lock in the derivation fix.
- `sqlserver-unit.test.ts`: stubbed async `exec` via `util.promisify.custom`
so tests drive both the `execSync` legacy path and the new async path.
- `SKILL.md`: Fabric config fenced block now declares `yaml` (markdownlint
MD040).
All tests: 43/43 sqlserver driver + 238/238 opencode test suite.
Attribution: findings identified by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
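The keying behavior that the updated token-cache test locks in can be sketched independently of the driver (every name below is hypothetical, not the driver's actual code):

```typescript
// A counting stub credential plus a cache keyed on (resource, clientId):
// two distinct client IDs must each trigger getToken once, while a repeat
// with an already-seen key is served from the cache.
type CachedToken = { token: string; expiresAt: number }

class CountingCredential {
  calls = 0
  async getToken(_scope: string): Promise<CachedToken> {
    this.calls += 1
    return { token: `tok-${this.calls}`, expiresAt: Date.now() + 3_600_000 }
  }
}

async function demo(): Promise<number> {
  const cred = new CountingCredential()
  const cache = new Map<string, CachedToken>()
  const acquire = async (resource: string, clientId?: string): Promise<string> => {
    const key = `${resource}::${clientId ?? "default"}`
    const hit = cache.get(key)
    if (hit && hit.expiresAt > Date.now()) return hit.token
    const fresh = await cred.getToken(`${resource}/.default`)
    cache.set(key, fresh)
    return fresh.token
  }
  await acquire("https://database.windows.net", "app-a")
  await acquire("https://database.windows.net", "app-b") // new key → second getToken
  await acquire("https://database.windows.net", "app-a") // cache hit → no new call
  return cred.calls
}

demo().then((calls) => console.log(calls)) // 2
```

Counting actual `getToken` invocations, rather than comparing returned tokens, is what makes the test prove the cache is keyed correctly.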
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/drivers/src/sqlserver.ts (1)
147-218: Consider single-flight dedup for concurrent cold-cache token acquisitions. If two (or many) `connect()` calls for the same `(resourceUrl, clientId)` race before the first token lands in the cache, each call runs the full `DefaultAzureCredential` → `az` path independently — exactly the "wasteful, risks throttling" scenario the cache comment flags. A small in-flight map (`Map<string, Promise<string>>`) next to `tokenCache` collapses concurrent cold requests to a single network roundtrip.

♻️ Suggested approach

```diff
 const tokenCache = new Map<string, { token: string; expiresAt: number }>()
+const inflightTokenRequests = new Map<string, Promise<string>>()
```

Then in `acquireAzureToken`:

```diff
 const acquireAzureToken = async (): Promise<string> => {
   const cached = tokenCache.get(cacheKey)
   if (cached && cached.expiresAt - Date.now() > TOKEN_REFRESH_MARGIN_MS) {
     return cached.token
   }
+  const existing = inflightTokenRequests.get(cacheKey)
+  if (existing) return existing
+  const p = (async () => {
+    // ... existing acquisition + cache-set logic ...
+    return token!
+  })().finally(() => inflightTokenRequests.delete(cacheKey))
+  inflightTokenRequests.set(cacheKey, p)
+  return p
 }
```

Low priority — only matters when multiple pools spin up concurrently at process start — but it's cheap insurance for the documented motivation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 147 - 218, acquireAzureToken can run multiple concurrent cold-cache fetches; add a single-flight map (e.g. const inFlight = new Map<string, Promise<string>>()) alongside tokenCache and use cacheKey as the key: at start of acquireAzureToken check if inFlight.has(cacheKey) and return the stored Promise; if not, create the Promise that performs the existing token acquisition logic (DefaultAzureCredential path then az CLI fallback), store it in inFlight before awaiting internals, and ensure you remove(inFlight) in both fulfillment and rejection handlers so failures don't permanently block retries; keep existing tokenCache.set and TOKEN_REFRESH_MARGIN_MS logic unchanged.

packages/drivers/test/sqlserver-unit.test.ts (1)
119-131: Optional: drop the `execSync` mock branch now that production uses async `exec`. The driver no longer calls `execSync` (it was replaced with `promisify(exec)` in `packages/drivers/src/sqlserver.ts` lines 183-189). The `execSync` stub here is dead mock-surface — keeping it risks implying `execSync` is still a supported fallback path in the driver.

♻️ Suggested cleanup

```diff
 mock.module("node:child_process", () => ({
   ...realChildProcess,
-  execSync: (cmd: string) => {
-    cliState.lastCmd = cmd
-    if (cliState.throwError) {
-      const e: any = new Error(cliState.throwError.message ?? "az failed")
-      e.stderr = cliState.throwError.stderr
-      throw e
-    }
-    return cliState.output
-  },
   exec: execStub,
 }))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 119 - 131, Remove the dead execSync branch from the test mock: in the mock.module call that currently spreads realChildProcess and defines execSync and exec, delete the execSync property and its closure (the cliState handling and thrown Error creation) so only exec: execStub remains; keep the realChildProcess spread and existing execStub usage (referencing mock.module, execStub, and cliState) to reflect that production now uses the async promisified exec path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 186-200: The current call uses execAsync with a shell-interpolated
command including resourceUrl (user input), risking shell injection; replace
that usage by invoking execFile (or your promisified execFileAsync) to call "az"
with args ["account", "get-access-token", "--resource", resourceUrl, "--query",
"accessToken", "-o", "tsv"] so the value is passed as an argv element (no
shell), retain the same options (encoding, timeout) and preserve the existing
handling of stdout -> out -> token/expiresAt and azCliStderr in the catch; also
update the related test stub to mock execFile/execFileAsync instead of execAsync
or assert the argv array.
In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 810-820: runPartitionedDiff currently replaces
params.source/params.target with SELECT subqueries and then calls runDataDiff,
which prevents discoverExtraColumns from recognizing real table names and causes
extra_columns to be lost; fix this by running discoverExtraColumns (and
capturing discoveredExcluded) inside runPartitionedDiff while
params.source/params.target are still plain table names, then pass the
discovered extra_columns into the recursive runDataDiff call (e.g., set
params.extra_columns and params.excluded_audit_columns or forward
discoveredExcluded) instead of letting runDataDiff attempt discovery on the
SELECT strings; ensure you do not set partition_column in the recursive call as
before.
---
Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 147-218: acquireAzureToken can run multiple concurrent cold-cache
fetches; add a single-flight map (e.g. const inFlight = new Map<string,
Promise<string>>()) alongside tokenCache and use cacheKey as the key: at start
of acquireAzureToken check if inFlight.has(cacheKey) and return the stored
Promise; if not, create the Promise that performs the existing token acquisition
logic (DefaultAzureCredential path then az CLI fallback), store it in inFlight
before awaiting internals, and ensure you remove(inFlight) in both fulfillment
and rejection handlers so failures don't permanently block retries; keep
existing tokenCache.set and TOKEN_REFRESH_MARGIN_MS logic unchanged.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 119-131: Remove the dead execSync branch from the test mock: in
the mock.module call that currently spreads realChildProcess and defines
execSync and exec, delete the execSync property and its closure (the cliState
handling and thrown Error creation) so only exec: execStub remains; keep the
realChildProcess spread and existing execStub usage (referencing mock.module,
execStub, and cliState) to reflect that production now uses the async
promisified exec path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2c693281-52dc-47f7-b49c-a79a1d7331cd
📒 Files selected for processing (6)
- .opencode/skills/data-parity/SKILL.md
- packages/drivers/package.json
- packages/drivers/src/sqlserver.ts
- packages/drivers/test/sqlserver-unit.test.ts
- packages/opencode/src/altimate/native/connections/data-diff.ts
- packages/opencode/src/altimate/tools/data-diff.ts
✅ Files skipped from review due to trivial changes (2)
- packages/drivers/package.json
- .opencode/skills/data-parity/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opencode/src/altimate/tools/data-diff.ts
```ts
      const { stdout } = await execAsync(
        `az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
        { encoding: "utf-8", timeout: 15000 },
      )
      const out = stdout.trim()
      if (out) {
        token = out
        expiresAt = parseTokenExpiry(out)
      }
    } catch (err: any) {
      // Capture stderr so the final error message can hint at the root cause
      // (e.g. "Please run 'az login'", "subscription not found").
      azCliStderr = String(err?.stderr ?? err?.message ?? "").slice(0, 200).trim()
    }
  }
```
Prefer execFile over shell interpolation for az account get-access-token.
resourceUrl flows from config.azure_resource_url (user-supplied) directly into a shell command string. In the common case it's a well-formed URL, but a malformed/hostile value such as https://x/;other-cmd would be interpreted by the shell. Since the az invocation doesn't actually need a shell, execFile with arg array sidesteps both injection and quoting concerns and also keeps timeout/stderr semantics.
🛡️ Suggested change
```diff
-const childProcess = await import("node:child_process")
-const { promisify } = await import("node:util")
-const execAsync = promisify(childProcess.exec)
-const { stdout } = await execAsync(
-  `az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
-  { encoding: "utf-8", timeout: 15000 },
-)
+const childProcess = await import("node:child_process")
+const { promisify } = await import("node:util")
+const execFileAsync = promisify(childProcess.execFile)
+const { stdout } = await execFileAsync(
+  "az",
+  ["account", "get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o", "tsv"],
+  { encoding: "utf-8", timeout: 15000 },
+)
```

Note: the corresponding test stub at packages/drivers/test/sqlserver-unit.test.ts line 98 would need to also stub execFile (or switch the test to assert on execFileAsync's args array).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/drivers/src/sqlserver.ts` around lines 186 - 200, The current call
uses execAsync with a shell-interpolated command including resourceUrl (user
input), risking shell injection; replace that usage by invoking execFile (or
your promisified execFileAsync) to call "az" with args ["account",
"get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o",
"tsv"] so the value is passed as an argv element (no shell), retain the same
options (encoding, timeout) and preserve the existing handling of stdout -> out
-> token/expiresAt and azCliStderr in the catch; also update the related test
stub to mock execFile/execFileAsync instead of execAsync or assert the argv
array.
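The difference is easy to see with a harmless stand-in for `az` (the sketch below uses `echo` so it runs anywhere with a POSIX toolchain): `execFile` passes the hostile value as a single argv element, so shell metacharacters are never interpreted.

```typescript
import { execFile } from "node:child_process"
import { promisify } from "node:util"

const execFileAsync = promisify(execFile)

// A value that would be dangerous if interpolated into a shell string:
const hostile = "https://x/; echo INJECTED"

async function demo(): Promise<string> {
  // execFile: no shell involved — `hostile` arrives as one literal argument
  const { stdout } = await execFileAsync("echo", [hostile])
  return stdout.trim()
}

demo().then((out) => console.log(out)) // prints the literal string, semicolon and all
```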
```diff
   const sourceSql = `SELECT * FROM ${sourceTableRef} WHERE ${sourcePartWhere}`
   const targetSql = `SELECT * FROM ${targetTableRef} WHERE ${targetPartWhere}`

   const result = await runDataDiff({
     ...params,
-    where_clause: fullWhere,
     source: sourceSql,
     target: targetSql,
+    // Preserve the user's shared where_clause — it's dialect-neutral.
+    where_clause: params.where_clause,
     partition_column: undefined, // prevent recursion
   })
```
Partitioned diff silently regresses extra_columns auto-discovery to key-only.
After this rework, runPartitionedDiff replaces params.source/params.target with SELECT * FROM … subqueries before recursing into runDataDiff. Inside that recursive call, the auto-discovery at lines 938–949 invokes discoverExtraColumns(params.source, …), which bails out here because the "table name" now starts with SELECT:
```ts
// line 444
if (SQL_KEYWORDS.test(tableName)) return undefined
```

`extraColumns` stays empty, `spec.config.extra_columns` becomes `[]`, and — per the comment at lines 932–934 — the Rust engine then compares key existence only and reports rows as "identical" even when non-key values differ. So whenever a user supplies `partition_column` without explicitly listing `extra_columns`, per-partition diffs silently lose value comparison.
Do the discovery once in runPartitionedDiff (where params.source is still a plain table name — already guarded at line 701) and forward the result into the recursive call:
Suggested fix
```diff
 const sourceDialect = resolveDialect(params.source_warehouse)
 const targetDialect = resolveDialect(params.target_warehouse ?? params.source_warehouse)
 const { table1Name, table2Name } = resolveTableSources(
   params.source,
   params.target,
   sourceDialect,
   targetDialect,
 )
+// Auto-discover extra_columns up front so the recursive runDataDiff call
+// (which sees a SELECT subquery as params.source) doesn't silently fall
+// back to key-only comparison.
+let discoveredExtra = params.extra_columns
+let discoveredExcluded: string[] | undefined
+if (!discoveredExtra || discoveredExtra.length === 0) {
+  const discovered = await discoverExtraColumns(
+    params.source,
+    params.key_columns,
+    sourceDialect,
+    params.source_warehouse,
+  )
+  if (discovered) {
+    discoveredExtra = discovered.columns
+    discoveredExcluded = discovered.excludedAudit
+  }
+}
 ...
 const result = await runDataDiff({
   ...params,
   source: sourceSql,
   target: targetSql,
   where_clause: params.where_clause,
+  extra_columns: discoveredExtra,
   partition_column: undefined, // prevent recursion
 })
```

(You may also want to surface `discoveredExcluded` as `excluded_audit_columns` on the aggregated result for parity with the non-partitioned path.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines
810 - 820, runPartitionedDiff currently replaces params.source/params.target
with SELECT subqueries and then calls runDataDiff, which prevents
discoverExtraColumns from recognizing real table names and causes extra_columns
to be lost; fix this by running discoverExtraColumns (and capturing
discoveredExcluded) inside runPartitionedDiff while params.source/params.target
are still plain table names, then pass the discovered extra_columns into the
recursive runDataDiff call (e.g., set params.extra_columns and
params.excluded_audit_columns or forward discoveredExcluded) instead of letting
runDataDiff attempt discovery on the SELECT strings; ensure you do not set
partition_column in the recursive call as before.
Commit 38cfb0e removed `@azure/identity` from the drivers package's `peerDependencies` (tedious already bundles it), but the lockfile's `packages/drivers` workspace section still carried the corresponding `peerDependencies` and `optionalPeers` blocks. CI running `bun install --frozen-lockfile` would fail on the drift.

Minimal edit — just removes the two stale blocks. No resolution changes (`bun install --frozen-lockfile` passes with "no changes").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:605">
P1: The new date parsing rewrites MySQL/MariaDB week partition values (`YYYY-WW`) into `YYYY-MM-DD`, causing partition WHERE clauses that never match.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
```ts
const d = new Date(trimmed)
return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
```
P1: The new date parsing rewrites MySQL/MariaDB week partition values (YYYY-WW) into YYYY-MM-DD, causing partition WHERE clauses that never match.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/altimate/native/connections/data-diff.ts, line 605:
<comment>The new date parsing rewrites MySQL/MariaDB week partition values (`YYYY-WW`) into `YYYY-MM-DD`, causing partition WHERE clauses that never match.</comment>
<file context>
@@ -591,7 +591,21 @@ function buildPartitionWhereClause(
+ // (e.g. from Postgres, BigQuery, DATE_FORMAT MySQL output) flow through
+ // without surprising timezone shifts.
+ if (/^\d{4}-\d{2}-\d{2}(\s|T|$)/.test(trimmed)) return trimmed.slice(0, 10)
+ const d = new Date(trimmed)
+ return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
+ })()
</file context>
```diff
-const d = new Date(trimmed)
-return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
+if (!/[A-Za-z]/.test(trimmed)) return trimmed
+const d = new Date(trimmed)
+return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
```
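The hazard the letter guard avoids: JavaScript's `Date` constructor accepts `"2024-05"` as an ISO year-month, so a MySQL `YYYY-WW` week value would be silently rewritten into a real calendar date.

```typescript
// "2024-05" here means "week 5 of 2024" (e.g. DATE_FORMAT(..., '%Y-%U')),
// but Date() reads it as May 2024:
console.log(new Date("2024-05").toISOString().slice(0, 10)) // 2024-05-01

// The suggested guard only Date-parses values that contain letters, such as
// a stringified JS Date ("Mon Jan 01 2024 ..."); digit-only values pass
// through untouched.
function normalizePartitionValue(trimmed: string): string {
  if (!/[A-Za-z]/.test(trimmed)) return trimmed
  const d = new Date(trimmed)
  return Number.isNaN(d.getTime()) ? trimmed : d.toISOString().slice(0, 10)
}

console.log(normalizePartitionValue("2024-05")) // 2024-05 (unchanged)
```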
Summary
- `connect`/`execute`/`listSchemas`/`listTables`/`describeTable`/`close` with T-SQL `TOP` injection, `sys.*` catalog queries, and row flattening for unnamed columns
- Auth types (`default`, `password`, `access-token`, `service-principal-secret`, `msi-vm`, `msi-app-service`) with shorthand aliases (`cli`, `default`, `password`, `service-principal`, `msi`)
- `sqlserver`/`mssql` → `tsql`, `fabric` → `fabric` with `DATETRUNC()` and `CONVERT(DATE, ..., 23)` for date partitioning
- `ConnectionPool` isolation for concurrent connections, unnamed-column array flattening, synthetic column name fallback

Key files
- packages/drivers/src/sqlserver.ts, packages/drivers/test/sqlserver-unit.test.ts
- packages/opencode/src/altimate/native/connections/data-diff.ts
- packages/opencode/src/altimate/tools/data-diff.ts
- .opencode/skills/data-parity/SKILL.md

Test plan
- `bun test packages/drivers/test/sqlserver-unit.test.ts` — 28 tests pass
- `bun test packages/opencode/test/altimate/data-diff-dialect.test.ts` — 9 tests pass

🤖 Generated with Claude Code
Summary by cubic
Adds full MSSQL and Microsoft Fabric support to the data-parity skill with Azure AD auth, dialect mapping, and a
`mssql` v12 upgrade. Improves cross‑warehouse SQL handling and Azure auth reliability, including token caching, sovereign cloud support, and lockfile cleanup for `@azure/identity`.

New Features
- Dialect mapping (`sqlserver`/`mssql` → `tsql`, `fabric` → `fabric`, `postgresql` → `postgres`, `mariadb` → `mysql`).
- `ConnectionPool` isolation, Azure AD access‑token cache with proactive refresh, automatic resource URL inference for Gov/China with `azure_resource_url` override, and async `az` CLI fallback; `@azure/identity` is no longer a peer dependency (bundled via `tedious`).

Bug Fixes
- `CONVERT(DATE, ..., 23)` for T‑SQL/Fabric partition date literals; escape single quotes.
- Reject `joindiff` across different warehouses with a clear error suggesting `hashdiff`/`auto`.
- Guard non-string `authentication`.

Written for commit 3ebcec1. Summary will update on new commits.
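For orientation, a sketch of what a Fabric connection entry of this shape might look like — the field names below mirror driver options discussed in this PR (`host`, the `authentication` shorthands, `azure_resource_url`), but the exact SKILL.md schema is an assumption:

```yaml
# Hypothetical config sketch — field names taken from this PR's discussion,
# not copied from the actual SKILL.md example.
fabric_prod:
  type: fabric
  host: myworkspace.datawarehouse.fabric.microsoft.com
  database: analytics
  authentication: default          # shorthand aliases: cli, password, service-principal, msi
  # Only needed for sovereign clouds; inferred from the host otherwise:
  # azure_resource_url: https://database.usgovcloudapi.net/
```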
Tests